geometric median
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (3 more...)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > New York (0.04)
- (9 more...)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.92)
Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers
Fang, Ziyi, Huang, Lingxiao, Yang, Runkai
We study the robust geometric median problem in Euclidean space $\mathbb{R}^d$, with a focus on coreset construction.A coreset is a compact summary of a dataset $P$ of size $n$ that approximates the robust cost for all centers $c$ within a multiplicative error $\varepsilon$. Given an outlier count $m$, we construct a coreset of size $\tilde{O}(\varepsilon^{-2} \cdot \min\{\varepsilon^{-2}, d\})$ when $n \geq 4m$, eliminating the $O(m)$ dependency present in prior work [Huang et al., 2022 & 2023]. For the special case of $d = 1$, we achieve an optimal coreset size of $\tildeΘ(\varepsilon^{-1/2} + \frac{m}{n} \varepsilon^{-1})$, revealing a clear separation from the vanilla case studied in [Huang et al., 2023; Afshani and Chris, 2024]. Our results further extend to robust $(k,z)$-clustering in various metric spaces, eliminating the $m$-dependence under mild data assumptions. The key technical contribution is a novel non-component-wise error analysis, enabling substantial reduction of outlier influence, unlike prior methods that retain them.Empirically, our algorithms consistently outperform existing baselines in terms of size-accuracy tradeoffs and runtime, even when data assumptions are violated across a wide range of datasets.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- (20 more...)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.92)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (7 more...)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (7 more...)
One-shot Robust Federated Learning of Independent Component Analysis
Jin, Dian, Bing, Xin, Zhang, Yuqian
This paper investigates a general robust one-shot aggregation framework for distributed and federated Independent Component Analysis (ICA) problem. We propose a geometric median-based aggregation algorithm that leverages $k$-means clustering to resolve the permutation ambiguity in local client estimations. Our method first performs k-means to partition client-provided estimators into clusters and then aggregates estimators within each cluster using the geometric median. This approach provably remains effective even in highly heterogeneous scenarios where at most half of the clients can observe only a minimal number of samples. The key theoretical contribution lies in the combined analysis of the geometric median's error bound-aided by sample quantiles-and the maximum misclustering rates of the aforementioned solution of $k$-means. The effectiveness of the proposed approach is further supported by simulation studies conducted under various heterogeneous settings.
- Research Report (1.00)
- Overview (1.00)